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Abstract 

In Monte Carlo integration an accurate and reliable deter¬ 
mination of the numerical integration error is essential. We 
point out the need for an independent estimate of the error 
on this error, for which we present an unbiased estimator. 
In contrast to the usual (first-order) error estimator, this 
second-order estimator can be shown to be not necessarily 
positive in an actual Monte Carlo computation. We propose 
an alternative and indicate how this can be computed in lin¬ 
ear time without risk of large rounding errors. In addition, 
we comment on the relatively very slow convergence of the 
second-order error estimate. 


1 Monte Carlo integration and its errors 

It does not need to be stressed that in numerical integration, including Monte 
Carlo (MC) integration [T], a determination or estimate of the integration 
error made is essential. The Central Limit Theorem (CLT) practically en¬ 
sures that if the number N of MC points is sufficiently large the numerical 
value of the MC integral - itself a stochastic variable - will have a Gaussian 
distribution around the true integral value, with a standard deviation that 
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can itself also be estimated: this is the first-order error. The results of MC 
integrations are therefore usually reported as 


”result” ± ’’error” 


with the understanding that the ’’error” value quoted is the Gaussian’s stan¬ 
dard deviation. In this way one can, for instance, assign confidence levels 
when comparing the integration result with a measurement. However, since 
the Gaussian distribution is quite steep, a modest change in the value of the 
error can change the confidence levels considerably. It is therefore preferable 
to also have a second-order error that estimates how well the first-order error 
was computed. The better way to report the result of a MC integration is 
then 

’’result” ± ^’’first-order error” ± ’’second-order error”^ . 

A first attempt to implement such a method was presented in [2]. However, in 
that paper no explicit form of the second-order error estimator was presented, 
nor were its numerical stability properties and its convergence behaviour 
discussed: also it was (wrongly) stated that the second-order error was the 
square root of the estimator, while it ought to be the fourth root. The present 
paper addresses and corrects these issues. In what follows we shall arrive at 
an estimator for the second-order error that, like the first-order one, can be 
evaluated in linear time i.e. at essentially no extra CPU cost. We shall also 
discuss several of its numerical aspects, and suggest an improvement. 

2 Error estimators 

We will start by defining some mathematical tools. We consider an integral 
over an integration region T of an integrand f(x), with x € T. We have at 
our disposal a set of MC integration points Xj ,j = 1, 2,..., N, assumed to 
be iicl (Independent, Identically Distributed) with a probability distribution 
P(x) in T. We define 



r 


( 1 ) 
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so that J\ — J dx f(x), the integral we want to compute. The numbers 
Wj = w(xj ) are called the weights of the points. We see that J p is nothing 
but the expectation value of w(x) p : 

(w p ) = J p . (2) 

Furthermore, we define the following multiple sums: 

N 

S Pi,P2,.., Pk = Y w h Plw h P2 . Wj k Pk ( 3 ) 

with the condition that the indices ji, 2 ,...,fc are all different. As an example, 
the sum S^i does not contain N 2 but N- = N 2 — N terms. The falling 
powers are defined by 

ATE = N\/(N - p)\ = N(N - 1)(N - 2) • • • (N — p + 1) . (4) 

The simple sums S p can be evaluated in linear time (that is, using N addi¬ 
tions), but a multiple sum S pi ^^ Pk needs time of the order N k . In calculating 
estimators we therefore want to use only simple sums. On the other hand, 
only the multiple sums have a simple expectation value: 

i^pi,P2,—,Pk) = Jp 1 Jp 2 ■ ■ ■ Jp k ■ (3) 

We can relate simple and multiple sums to one another by the following 
obvious rule: 


5, 


Pl,P2, 


■Xk^q 


Spi+q,P2,--;Pk A Spi,P2+q,..;Pk + ■ ■ ■ + pi,p 2 ,...,Pk+q 

T Sp 1 ,p2,...,pk,q ■ 


( 6 ) 


We are now ready to construct the various estimators, starting with the 
well-known MC formulae for clarity. For the integral we have 


E\ = 



(7) 


since (Ef) = Jp moreover we see that this estimator is unbiased. For the 
variance of E\ we have 


(■ E ?) ~ ( E ,) 1 


-pp (S 2 + 5i,i) - Ji 2 

= ( 8 ) 
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so that the appropriate estimator is 


S 2 Si,i 1 

N 2 N-N Nm 2 


S 2 = N S 2 — Si 2 


(9) 


The latter form is more suited to computation since it can be evaluated in 
linear time. From Eq.(l8l) we see that the first-order error, defined as E 2 1 ^ 2 
decreases as iV -1//2 , as is of course very well known. Moreover, the expected 
error is defined for all functions that are quadratically integrable, as is equally 
well known. 


The second-order error should have as its expectation value the variance 
of E 2 , which by the same methods as above can be shown to be 

(e 2 2 )-(E 2 ) 2 = dj (j t - i.h.h + 3J 2 2 - 4 (./,-Ji 2 ) 2 ) 

■ ( 10 ) 

We see that the second-order error, defined as E±^ decreases, for large N, 

as At -3 / 4 . Moreover we see that the second-order error is only meaningful for 
integrands that are at least quartically integrable. The appropriate unbiased 
estimator with the correct expectation value is 

E * = a5a 5 - 4S d) + wmr* 2 ' 

£4 = NS 4 -AS 3 S 1 + 3S 2 2 . ( 11 ) 

An important observation here concerns the asymptotic behaviour of the rel¬ 
ative errors. Whereas the relative first-order error, i.e. the ratio E 2 1 ^ 2 /Ei, 
goes as At -1 / 2 according to the ‘standard’ behaviour in MC, the relative 
second-order error E^ 1 ^/E 2 ^ 2 only decreases as fast as N - 1 / 4 . It will there¬ 
fore take much longer for the error to be well-determined than for the integral 
itsehQ. 

A final point is in order. By the CLT we know that the distribution of E\ 
in an ensemble of MC computations is normally distributed, which tells us 
the meaning of E 2 , as discussed above. Since E 2 is not computed as a simple 
average, its distribution is not governed by the same CLT. Nevertheless, 

4 Note that the relative errors as defined here are the dimensionless ratios, the only 
meaningful measures of performance of the computation. 
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as is shown in the Appendix a good case can be made for it being also 
approximately normally distributed, so that the relation between E ti and the 
confidence levels of E 2 can be treated in the usual manner. Below, we shall 
illustrate this with several examples. 


3 Positivity and numerical stability 

In principle, equations (J7|), (j9j) and (ITT]) are what is necessary to obtain the 
integral and its first- and second-order errors. However, a number of consid¬ 
erations must modify this picture. In the first place, the issue of positivity. 
Writing w(x) — J\ + u(x) so that / dx P(x) u(x ) = 0, we have 

J 2 — J\ 2 — J dx P(x) u(x) 2 , 

J 4 — 4 J 3 Ji + 3 J 2 2 = J dx P(x) u(x) A + 3 (/ dx P(x) u(x ) 2 ^j 
J 4 - 4J 3 J 4 + 3J 2 2 - 4 (j 2 - A 2 ) 2 = 

i J dxdy P(x)P(y)(u(x) 2 - u(y) 2 ) 2 , 

( 12 ) 


so that the expectation values of A 2j4 are positive, as they should. In addition, 
since with the notation Wj = Ei + Uj the S 2 can be written as 

s 2 = \ Y, Oi - u kf , (13) 

z j,k 


also E 2 itself is strictly nonnegative in any actual MC calculation. For E 4 
this does not hold, however. A counterexample can be constructed as follows. 
Let us assume that the MC weights Wj take on only the values 0 and 1, and 
that E\ = Nb , b e [0,1]. We then have 


S 2 = S 4 = N 2 a , a = b — b 2 e [0,1/4] 
The value of E 4 now comes out as 

4N 3 — 6 N 2 9 \ 

- a J ’ 


, _ ±f^z 

4 m \ n a 


(14) 


(15) 
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which is actually negative for 


(N - l ) 2 _ 1 N — 2 

a > iV(4iV - 6 ) ~ 4~ 2N(4N - 6 ) ' ( ' 16 ' ) 

Although by small margin (surprisingly, in this counterexample, for 6 « 1/2), 
the positivity of E 4 cannot be guaranteed, so that E 4 I 4 may be undefined. 
As an improvement on this situation we propose to abandon the estimator 
E 4 in favour of 

^ = V3V5 - 4E2 ') ' (17) 

This estimator has a slight (order 1/N) bias, which ought to be acceptable 
since we are dealing with only the second-order error here; its advantage is 
that, since 

iV 2 E4-4S 2 2 = ^^(u/-u fc 2 ) 2 , (18) 

Z j,k 

it always evaluates to a nonnegative number. 


The second issue is that of numerical stability. It is well known that 
already the evaluation of £ 2 involves large cancellations which may destroy 
the numerical stability of the calculation and can actually lead to negative 
values for E 2 : this is the reason why the straightforward computation of E 2 
usually cannot be reliably performed with single-precision arithmetic^. This 
problem has been widely discussed, for instance in [31 S] - The situation of £ 4 , 
which involves even larger cancellations, is certainly worse. To tackle these 
problems, we adopt the CGV algorithm first described in |4]. The strategy 
of this algorithm can best be summarized as follows. In the first place, one 
concentrates on objects that are supposed to go to a finite asymptotic value. 
E\ is such an object, but S 2j 4 are not. In the second place, the algorithm 
focuses on the update of these numbers as N is increased by 1. So let us 
define 

M(N) = S 4 (N)/N , 

P{N) = S 2 {N)/N - Si(N) 2 /N 2 , 

Q(N) = S 3 (N)/N — 3S2(N)Si(N)/N 2 + 2S\(N) 3 /N 3 , 

R(N) = S 4 (N)/N- 433^)3^)/^+ 3S 2 (N) 2 /N 2 -4P(N) 2 .(19) 

5 As anyone who has ever taught courses on Monte Carlo integration can testify. 
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Here we have explicitly indicated the N dependence of the running sums 
Si, 2 , 3 , 4 - We also define 

m = M(N — 1) , p = P(N— 1) , q = Q(N — 1) , u = w N — m . (20) 
The authors of pE] have already established the update rules 

M(N) — m + , 

, N N - 1 / 1 9 \ 

P[N) = — ( P+ N U ) ■ (21) 

We see that in particular the computation of P(N) is free of large cancella¬ 
tions. Some algebra leads us to supplement these update rules by 

Q(N) 

R(N) 

( 22 ) 


TV- 1 
N 

N - 1 
N 


N — 2 n 3 p 

q + ^ U ~N U 


R{N-1) + 


N 


P~ 


N -2 


N 


-u 


-4 


Q P 2 
N U N 2U 


Using these results, for any given N we then have 

E 2 = ^P(JV) , E 4 = ^gR(N) ■ (23) 

4 A case study 

To illustrate all the above, we can perform a simple but enlightening study. 
Let us consider the following class of integrands: 

f a (x) = (1 + a) x a , x G (0,1] , — 1 < a < 0 , (24) 

which we shall integrate by employing N pseudorandom numbers, iid uni¬ 
formly in (0,1]. These functions are all integrable (with J\ — 1), but diver¬ 
gent as x — » 0. For a < —0.5 they are not quadratically integrable, and for 
a < —0.25 they are quadratically integrable but not quartically integrable. 
Consequently, for a < —0.25 the expectation value (-U 4 ) is not defined, and 
for a < —0.5 not even (U 2 ) is defined. Nevertheless, in any actual MC 
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calculation of this integral, £ 2 ,4 and E 2 , E 4 will have definite, well-defined 
numerical values. So how, then, are these to be interpreted? 

Below, we give the results for E 2 1 ^ 2 and E^ A in a MC run where N < 10 4 , 
monitoring their behaviour while N increases. This we do for values of a 
r unning from —0.1 down to —0.9. 



The upper line is the evolution of E 2 1 ^ 2 , and the lower line displays E X 4 A . 
For the smoothest case, a = —0.1, the IV -1 behaviour for E 2 and the iV -3 
behaviour for E 4 are evident^, marred by smallish jumps whenever an x 
value close to the singularity at x = 0 is encountered. As a decreases to 
-0.3 quartic integrability is lost, which can be seen from the fact that the 
jumps in E 4 are now much larger while those in E 2 remain modest. Note 
that in all cases exactly the same set of pseudrandom numbers was used. 
Therefore in the various plots the jumps are in the same place, they simply 
become larger and larger. For a = —0.6 where the integrand is also no 


6 Note that in the plots the values given are those of E ^' 2 and E\^. 























longer quadratically integrable even the iV -1 behaviour of £ 2 becomes quite 
distorted by the growing jumps. Finally, at a = — 0.9 where the function 
itself is barely integrable, the jumps have become so large that the short¬ 
term iV -1 and N~ 3 behaviour inbetween the jumps can no longer ensure 
this behaviour over longer N ranges. It is this kind of behaviour — short- 
range smooth decrease interspersed with (for increasing singularness of the 
integrand) increasingly large local jumps — that ruins the usefulness of £4, 
then £ 2 , and, for non-integrable functions, finally even E\. 

From this excercise we conclude that it should always be a good idea, in 
any MC calculation, to monitor the behaviour of £ 2 and £4 as N increases; 
and that this may tell us whether the second-order error, or indeed even 
the first-order error itself, can be assigned any useful meaning. It should be 
pointed out that, in our case study, the jumps in £4 are typically larger that 
those in E 2 and that £4 is therefore a more sensitive probe of possible conver¬ 
gence problems; and, independently of that, an estimate of how accurately 
the integration error itself is estimated is in our opinion always adviseable. 


Conclusions 

We have argued that the current practice of MC integration, resulting in 
a report on the integral estimate and its error estimate, should always be 
accompanied by a second-order error estimate, if only to validate the assign¬ 
ment of confidence levels to the result (which can be, for instance, crucial in 
comparing the results of different MC calculations, which is good and com¬ 
mon practice). We have presented the relevant estimators. A closer look at 
£4 shows potential positivitiy problems and we have emended this by defin¬ 
ing an improved estimator £4. We also point out that, on the one hand, the 
convergence of the second-order error, E\^/E 2 1 ^ 2 ~ N ~ 1,/4 , rather than the 
‘well-known’ £ 2 1//2 /£i N h 2 convergence of the error itself, and that on 
the other hand E 2 satisfies its own version of the central-limit theorem. In 
addition, we have extended the methods of the Chan-Golub-Leveque algo¬ 
rithm [3j to allow for a numerically stable computation of not only £ 2 but 
£4 as well. 
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Appendix 


In this Appendix we will argue that the values of £2 obey their own version 
of CLT. This is not automatically obvious, since we can write 


N 


V 2 = N'£(w j -M(N)y 

3= 1 


(25) 


and therefore the summed quantities are not independent of one another. Let 
us therefore consider a number of MC weights uij, j = 1,2,,N, that are 
identically distributed with probability density P(w) but under the constraint 
that 


N 


We define 


E w o = 0 • 

3 = 1 


1 N 

x = jf^i 2 . 

JV 3 =1 


(26) 


(27) 


and estimate the distribution of X for large N as follows. The moment¬ 
generating function of X reads 

(e lzX ' K j oc J du dw 1 • • • dwN -P(uq) • • • P(wn) exp (iu ^ Wj + i— ^ 

= J du jy dw P(w) exp (iuw + i— w 2 ^j j , (28) 

where the integrals run from — 00 to +00. Introducing 

$ k {u) = [ dw P{w) e iuw w k (29) 


we can estimate 


J dw P(w) exp (iuw + ut 


N 


= exp L N log ($<>(«) +ij^® 2 (u) - <S> 4 (u) + O 


$o(«) exp ^izA(u) - — t(u)J , 

A(u) = $ 2 (u)/$o(u) , t(u) = <L 4 (u)/<I>o(m) • 


(30) 

(31) 
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Now, since d>o(0) = 1 is the absolute maximum of an d 

<l>o(w) = 1 + iu (w) - — (w 2 ) + 0(u 3 ) , (32) 

we can estimate 

|$ 0 ( M )| 2 = 1 — u 2 a 2 + 0(u 4 ) , a 2 = (w 2 ^ — (w) 2 , (33) 

so that we may approximate 

(34) 

and the u integral is dominated by the values of u around zero; consequently, 

(e“) « exp Ga( 0) - A r ( 0 )f (35) 

and the probability density for A" to take on the value x is 

Pr(A = z) oc ex P A (°)) 2 ) > 

A(0) = (w 2 ^ , t(0) = ~ ( u ' 2 ) • (36) 

We see that in this sense a CLT holds for the distribution of Yl w j 2 - 

As an illustration, we generate a large number (10 6 ) of samples of N 
(pseudo-)random numbers uniformly in the interval [0,1], and compute for 
these -Ei, 2 , 4 - Below, we give the actual distribution of the E\ values together 
with the CLT Gaussian approximation with a width given by E 2 1 ^ 2 . Similarly, 
we also give the actual distribution of the E 2 values with their CLT Gaussian 
approximation with width E ^ 2 . We do this both for N = 10 and for N = 
1000 . 
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Unsurprisingly, for N = 1000 the CLT approximation is excellent, but for 
N — 10 it is evdident that the approximation is much worse for E 2 than for 
Ei. 

We repeat the same excercise for numbers that are exponentially dis¬ 
tributed, that is, with probability density P{x) = exp(— x), x G [0, oo). 





Because of the long high-a: tail of this P(x), the CLT approximation is ap¬ 
preciably worse for N = 10 although still very good for N = 1000. 

Finally, we consider numbers distributed according to the exponential 
integral [5]: 

OO 

P(x) — Ei(x) = J dt-j- j x e (0, oo) , (37) 

X 
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which looks like e~ x for large x, and like — log(a;) for x close to zero. Such a 
distribution, with both many low-x values and a high-a; tail, is typical for how 
weights arising from MC event generators in particle physics are distributed. 





The CLT approximation is, unsurprisingly, very poor for N = 10. However, 
for large N values it is still seen to be quite good, where we must recall that 
N = 1000 is actually quite a small number for any serious calculation. 
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